FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos

Authors

Abstract

Deep learning based visual-to-sound generation systems have been developed that identify and create audio features from video signals. However, these techniques often fail to consider the time-synchronicity of visual features. In this paper, we introduce a novel method for guiding a class-conditioned GAN to synthesize representative audio with temporally extracted visual information. We accomplish this task by adapting the synchronicity traits between the audio-visual modalities. Our proposed FoleyGAN model is capable of conditioning on action sequences of visual events, leading to visually aligned, realistic soundtracks. We expanded our previously proposed Automatic Foley data set. We evaluated FoleyGAN's synthesized sound output through human surveys, which show noteworthy (on average 81%) performance. Our approach outperforms other baseline models and data sets in statistical and ablation experiments, achieving improved IS, FID and NDB scores. The analysis showed the significance of temporal feature extraction as well as the augmented performance of the network. Overall, the model achieves a retrieval accuracy of 76.08%, surpassing existing visual-to-audio synthesis deep neural networks.
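
The abstract reports FID among its evaluation scores without spelling it out. As a rough illustration only, and not the authors' evaluation code, the following sketch computes the standard Fréchet distance between two sets of feature vectors, the quantity on which FID is based. The function name `frechet_distance` is hypothetical, and random arrays stand in for the embeddings that a pretrained feature extractor would normally supply.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features). Hypothetical helper,
    not taken from the FoleyGAN paper.
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # covariance matrices (up to floating-point noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    dist = diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt
    # Guard against tiny negative values caused by rounding.
    return float(max(dist, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 4))          # stand-in "real" features
    fake = real + 3.0                          # same spread, shifted mean
    print(frechet_distance(real, real))        # close to zero
    print(frechet_distance(real, fake))        # dominated by the mean shift
```

A lower value means the two feature distributions are closer; identical sets give a distance of approximately zero, and a pure mean shift contributes its squared length.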

Similar resources

Adversarial Examples Generation and Defense Based on Generative Adversarial Network

We propose a novel generative adversarial network to generate and defend against adversarial examples for deep neural networks (DNN). The adversarial stability of a network D is improved by training alternately with an additional network G. Our experiment is carried out on MNIST, and the adversarial examples are generated efficiently compared with widely used gradient-based methods. After tra...

Energy-based Generative Adversarial Network

We introduce the “Energy-based Generative Adversarial Network” model (EBGAN) which views the discriminator as an energy function that associates low energies with the regions near the data manifold and higher energies with other regions. Similar to the probabilistic GANs, a generator is trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign hi...

Metric Learning-based Generative Adversarial Network

Generative Adversarial Networks (GANs), a framework for estimating generative models via an adversarial process, have attracted huge attention and have proven powerful in a variety of tasks. However, training GANs is well known to be delicate and unstable, partly because of the sigmoid cross-entropy loss function used for the discriminator. To overcome such a problem, many researchers...

Wasserstein Generative Adversarial Network

Recent advances in deep generative models give us a new perspective on modeling high-dimensional, nonlinear data distributions. In particular, GAN training can successfully produce sharp, realistic images. However, GAN sidesteps the use of traditional maximum likelihood learning and instead adopts a two-player game approach. This new training behaves very differently compared to ML learning. Ther...

Controllable Generative Adversarial Network

Although it was introduced only recently, the generative adversarial network (GAN) has in the last few years shown many promising results in generating realistic samples. However, it is hardly able to control the generated samples, since the input variables for a generator are drawn from a random distribution. Some attempts have been made to control generated samples from GAN, but they have shown moderate results. Furt...

Journal

Journal title: IEEE Transactions on Multimedia

Year: 2022

ISSN: 1520-9210, 1941-0077

DOI: https://doi.org/10.1109/tmm.2022.3177894